Abstract

Previous studies have shown that health reports in social media, such as
DailyStrength and Twitter, have potential for monitoring health conditions
(e.g. adverse drug reactions, infectious diseases) in particular communities.
However, in order for a machine to understand and make inferences on these
health conditions, the ability to recognise when laymen's terms refer to a
particular medical concept (i.e. text normalisation) is required. To achieve
this, we propose to adapt an existing phrase-based machine translation (MT)
technique and a vector representation of words to map between a social media
phrase and a medical concept. We evaluate our proposed approach using a
collection of phrases from tweets related to adverse drug reactions. Our
experimental results show that the combination of a phrase-based MT technique
and the similarity between word vector representations outperforms the
baselines that apply only one of them by up to 55%.


1 Introduction 

Social media, such as DailyStrength¹ and Twitter², is a fast-growing and
potentially rich source of voice-of-the-patient data about experience in
terms of benefits and side-effects of drugs and treatments (O'Connor et al.,
2014). However, natural language understanding from social media messages is
a difficult task because of the lexical and grammatical variability of the
language (Baldwin et al., 2013; O'Connor et al., 2014). Indeed, language
understanding by machines requires the ability to recognise when a phrase
refers to a particular concept. Given a variable length phrase, an effective
system should return a concept with the most similar meaning. For example, a
Twitter phrase 'No way I'm gettin any sleep 2nite' might be mapped to the
medical concept 'Insomnia' (SNOMED:193462001), when using the SNOMED-CT
dictionary (Spackman et al., 1997). The success of the mapping between social
media phrases and formal medical concepts would enable an automatic
integration between patient experiences and biomedical databases.

¹ http://www.dailystrength.org/
² http://twitter.com

Existing works, e.g. (Elkin et al., 2012; Wang et al., 2009), mostly focused
on extracting medical concepts from medical documents. For example, Gobbel et
al. (2014) proposed a naive Bayesian-based technique to map phrases from
clinical notes to medical concepts in the SNOMED-CT dictionary. Wang et al.
(2009) identified medical concepts regarding adverse drug events in
electronic medical records. On the other hand, O'Connor et al. (2014)
investigated the normalisation of medical terms in Twitter messages. In
particular, they proposed to use the Lucene retrieval engine³ to retrieve
medical concepts that could be potentially mapped to a given Twitter phrase,
when mapping between Twitter phrases and medical concepts.

In contrast, we argue that the medical text normalisation task (Limsopatham
and Collier, 2015) can be achieved by using well-established phrase-based MT
techniques, where we translate a text written in a social media language
(e.g. 'No way I'm gettin any sleep 2nite') to a text written in a formal
medical language (e.g. 'Insomnia'). Indeed, in this work we investigate an
effective adaptation of phrase-based MT to map a Twitter phrase to a medical
concept. Moreover, we propose to combine the adapted phrase-based MT
technique and the similarity between word vector representations to
effectively map a Twitter phrase to a medical concept.

³ http://lucene.apache.org/

The main contributions of this paper are threefold:

1. We investigate the adaptation of phrase-based MT to map a Twitter phrase
to a SNOMED-CT concept.

2. We propose to combine our adaptation of phrase-based MT and the similarity
between word vector representations to map Twitter phrases to formal medical
concepts.

3. We thoroughly evaluate the proposed approach using phrases from our
collection of tweets related to the topic of adverse drug reactions (ADRs).


2 Related Work 


Phrase-based MT models (e.g. (Koehn et al., 2003; Och and Ney, 2004)) have
been shown to be effective in translation between languages, as they learn
local term dependencies, such as collocations, re-orderings, insertions and
deletions. Koehn et al. (2003) showed that a phrase-based MT technique
markedly outperformed traditional word-based MT techniques on several
benchmarks. In this work, we adapt the phrase-based MT technique of Koehn et
al. (2003) for the medical text normalisation task. In particular, we use the
phrase-based MT technique to translate phrases from Twitter language to
formal medical language, before mapping the translated phrases to medical
concepts based on the ranked similarity of their word vector representations.

Traditional approaches for creating word vector representations treated
words as atomic units (Mikolov et al., 2013b; Turian et al., 2010). For
instance, the one-hot representation used a vector with a length of the size
of the vocabulary, where one dimension is on, to represent a particular word
(Turian et al., 2010). Recently, techniques for learning high-quality word
vector representations (i.e. distributed word representations) that could
capture the semantic similarity between words, such as continuous bags of
words (CBOW) (Mikolov et al., 2013b) and global vectors (GloVe) (Pennington
et al., 2014), have been proposed. Indeed, these distributed word
representations have been effectively applied in different systems that
achieve state-of-the-art performances for several NLP tasks, such as MT
(Mikolov et al., 2013a) and named entity recognition (Passos et al., 2014).
In this work, besides using word vector representations to measure the
similarity between translated Twitter phrases and medical concepts, we use
the similarity between word vector representations of the original Twitter
phrase and a medical concept to augment the adapted phrase-based MT
technique.


3 Medical Term Normalisation 


We discuss our adaptation of phrase-based MT for medical text normalisation
in Section 3.1. Section 3.2 introduces our proposed approach for combining
the similarity score of word vector representations with the adapted
phrase-based MT technique.

3.1 Adapting Phrase-based MT 

We aim to learn a translation between a Twitter phrase (i.e. a phrase from a
Twitter message) and a formal medical phrase (i.e. the description of a
medical concept). For a given Twitter phrase $phr_t$, we find a suitable
medical phrase $phr_m$ using a translation score, based on a phrase-based
model, as follows:

$$\mathrm{translationscore}(phr_m \mid phr_t) = p(phr_m \mid phr_t) \tag{1}$$

where $p(phr_m \mid phr_t)$ can be calculated using any phrase-based MT
technique, e.g. (Koehn et al., 2003; Och and Ney, 2004). We then rank
translated phrases $phr_m$ based on this translation score. The top-$k$
translated phrases are used for identifying the corresponding medical
concept.

However, the translated phrase $phr_m$ may not exactly match the description
of any target medical concept. We propose two techniques to deal with this
problem. For the first technique, we rank the target concepts based on the
cosine similarity between the vector representation of $phr_m$ and the vector
representation of the description of each concept $desc_c$:

$$\mathrm{sim}_{cos}(phr_m, desc_c) = \frac{V_{phr_m} \cdot V_{desc_c}}{\lVert V_{phr_m} \rVert \, \lVert V_{desc_c} \rVert} \tag{2}$$

where $V_{phr_m}$ and $V_{desc_c}$ are the vector representations of $phr_m$
and $desc_c$, respectively. Any technique for creating word vector
representations (e.g. one-hot, CBOW and GloVe) can be used. Note that if a
phrase (e.g. $phr_m$) contains several terms, we create a vector
representation by summing the value of the same dimension of the vector
representation of each term (i.e. element-wise addition).
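
As an illustration of the first technique, the following minimal Python sketch builds phrase vectors by element-wise addition and ranks concepts by the cosine similarity of Equation (2). The three-dimensional vectors, the concept identifiers (`C_INSOMNIA`, `C_HEADACHE`) and the choice to skip out-of-vocabulary terms are assumptions made for the example; they are not taken from our experiments.

```python
import math

# Hypothetical 3-dimensional word vectors; real ones would come from
# one-hot, CBOW or GloVe.
TOY_VECTORS = {
    "insomnia": [1.0, 0.0, 0.0],
    "sleeplessness": [0.9, 0.1, 0.0],
    "headache": [0.0, 1.0, 0.0],
    "severe": [0.0, 0.2, 0.8],
}

# Hypothetical concept identifiers mapped to their descriptions.
TOY_CONCEPTS = {"C_INSOMNIA": "insomnia", "C_HEADACHE": "severe headache"}


def phrase_vector(phrase, word_vectors, dim):
    """Element-wise sum of the vectors of the terms in a phrase."""
    total = [0.0] * dim
    for term in phrase.lower().split():
        vec = word_vectors.get(term)
        if vec is None:
            continue  # assumption: out-of-vocabulary terms are skipped
        total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity as in Equation (2); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_concepts(translated_phrase, concepts, word_vectors, dim):
    """Return (concept_id, score) pairs, highest cosine similarity first."""
    phrase_vec = phrase_vector(translated_phrase, word_vectors, dim)
    scored = [
        (cid, cosine(phrase_vec, phrase_vector(desc, word_vectors, dim)))
        for cid, desc in concepts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these toy vectors, `rank_concepts("sleeplessness", TOY_CONCEPTS, TOY_VECTORS, 3)` places the hypothetical insomnia concept first, because its description vector points in almost the same direction as the phrase vector.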

On the other hand, the second technique also incorporates the ranked position
$r$ of the translated phrase $phr_m$ when translated from the original phrase
$phr_t$ using Equation (1). Indeed, the second technique calculates the
similarity score as follows:

$$\mathrm{sim}_{rcos}(phr_m, desc_c) = \frac{1}{r} \cdot \frac{V_{phr_m} \cdot V_{desc_c}}{\lVert V_{phr_m} \rVert \, \lVert V_{desc_c} \rVert} \tag{3}$$
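
The rank-aware scoring can be sketched as follows, assuming that the score for a translated phrase at rank $r$ is its cosine similarity scaled by $1/r$. How the scores of the top-$k$ translations are aggregated per concept is not specified above; taking the maximum over the $k$ translations, as done here, is an assumption of this sketch, and the function name `rank_aware_scores` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_aware_scores(translation_vectors, concept_vectors):
    """Score each concept by (1 / r) * cosine, keeping the best value over ranks.

    translation_vectors: phrase vectors of the top-k translations, best first,
    so the vector at index 0 has rank r = 1.
    concept_vectors: mapping from concept id to description vector.
    """
    scores = {}
    for r, t_vec in enumerate(translation_vectors, start=1):
        for cid, c_vec in concept_vectors.items():
            score = cosine(t_vec, c_vec) / r
            # Assumption: aggregate over the k translations with max.
            if cid not in scores or score > scores[cid]:
                scores[cid] = score
    return scores
```

A concept that matches only the second-ranked translation perfectly therefore receives 0.5, half the score of a perfect match with the top-ranked translation.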

3.2 Combining Similarity Score with 
Phrase-based MT 

As discussed in Section 2, word vector representations (e.g. created by CBOW
or GloVe) can capture semantic similarity between words by themselves. Hence,
we propose to map a Twitter phrase $phr_t$ to a medical concept $c$, which is
represented with a description $desc_c$, by linearly combining the cosine
similarity between the vector representations of the Twitter phrase $phr_t$
and the description $desc_c$ with the similarity score computed using one of
the adapted phrase-based MT techniques (introduced in Section 3.1), as
follows:

$$\mathrm{sim}_{combine}(phr_t, desc_c) = \frac{V_{phr_t} \cdot V_{desc_c}}{\lVert V_{phr_t} \rVert \, \lVert V_{desc_c} \rVert} + MT_a(phr_t, desc_c) \tag{4}$$

where $MT_a(phr_t, desc_c)$ is calculated using one of the adapted
phrase-based MT techniques described in Section 3.1.
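
A minimal sketch of the linear combination in Equation (4) is given below. It assumes that the MT-based scores have already been computed per concept and are passed in as a dictionary; the names `combined_scores` and `best_concept`, and the choice to treat a concept without an MT score as contributing 0, are assumptions of the example.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_scores(tweet_vector, concept_vectors, mt_scores):
    """Equation (4): cosine(V_phr_t, V_desc_c) + MT_a(phr_t, desc_c) per concept.

    mt_scores maps concept ids to the score of an adapted MT technique;
    concepts missing from it contribute 0.0 (an assumption of this sketch).
    """
    return {
        cid: cosine(tweet_vector, c_vec) + mt_scores.get(cid, 0.0)
        for cid, c_vec in concept_vectors.items()
    }


def best_concept(scores):
    """Return the concept id with the highest combined score."""
    return max(scores, key=scores.get)
```

For example, when the original Twitter phrase is equally similar to two concepts, the MT-based score decides which of them is ranked first.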


4 Experimental Setup 
4.1 Test Collection 


To evaluate our approach, we use a collection of 25 million tweets related to
adverse drug reactions (ADRs). In particular, these tweets are related to
cognitive enhancers (Hanson et al., 2013) and anti-depressants (Schneeweiss
et al., 2010) that can have adverse side effects. We use 201 ADR phrases and
their corresponding SNOMED-CT concepts annotated by a PhD-level computational
linguist. These phrases were anonymised by replacing numbers, user IDs, URIs,
locations, email addresses, dates and drug names with appropriate tokens,
e.g. _NUMBER_.
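
The kind of token replacement described above could be sketched as follows. Only the `_NUMBER_` token appears in our data description; the other token names (`_URL_`, `_EMAIL_`, `_USER_`, `_DRUG_`), the regular expressions and the drug list are assumptions of this example, and locations and dates are omitted because they would need dedicated lexicons or taggers.

```python
import re

# Order matters: URLs and e-mail addresses are replaced before user IDs
# and numbers, so that their inner parts are not rewritten separately.
PATTERNS = [
    (re.compile(r"https?://\S+"), "_URL_"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "_EMAIL_"),
    (re.compile(r"@\w+"), "_USER_"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "_NUMBER_"),
]


def anonymise(text, drug_names=()):
    """Replace URLs, e-mail addresses, user IDs, numbers and drug names."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    for name in drug_names:
        # Whole-word, case-insensitive match against a hypothetical drug list.
        text = re.sub(r"\b%s\b" % re.escape(name), "_DRUG_", text,
                      flags=re.IGNORECASE)
    return text
```

Note that the number pattern only matches stand-alone numbers, so informal spellings such as '2nite' are left untouched for the translation model to handle.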


4.2 Evaluation Approach 

We conduct experiments using 10-fold cross validation, where the Twitter
phrases are randomly divided into 10 separate folds. We address this task as
a ranking task, where we aim to rank the medical concept with the highest
similarity score, e.g. calculated using Equation (2), at the top rank. Hence,
we evaluate our approach using the Mean Reciprocal Rank (MRR) measure
(Craswell, 2009), which is based on the reciprocal of the rank at which the
first relevant concept is viewed in the ranking. In addition, we test the
significance of the difference between the performance achieved by our
proposed approach and that of the baselines using the paired t-test
(p < 0.05).
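
For concreteness, MRR can be computed as in the sketch below. The optional `cutoff` argument reflects one plausible reading of the MRR-5 measure reported in our results (only the top 5 ranked concepts are considered, and a phrase whose correct concept is ranked lower contributes 0); this reading is an assumption of the example.

```python
def reciprocal_rank(ranked_ids, relevant_id, cutoff=None):
    """Return 1 / rank of the relevant concept, or 0.0 if it is not found."""
    top = ranked_ids if cutoff is None else ranked_ids[:cutoff]
    for position, cid in enumerate(top, start=1):
        if cid == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, gold_ids, cutoff=None):
    """Average the reciprocal ranks over all test phrases."""
    if len(rankings) != len(gold_ids):
        raise ValueError("rankings and gold_ids must have the same length")
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, gold, cutoff)
        for ranked, gold in zip(rankings, gold_ids)
    )
    return total / len(rankings)
```

For three phrases whose correct concepts are ranked first, second and not at all, the reciprocal ranks are 1, 0.5 and 0, giving an MRR of 0.5.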

4.3 Word Vector Representation 

We use three different techniques, namely one-hot, CBOW and GloVe, to create
the word vector representations used in our approach (see Section 3). In
particular, the vocabulary for creating the one-hot representation includes
all terms in the Twitter phrases and the descriptions of the target SNOMED-CT
concepts. Meanwhile, we create word vector representations based on CBOW and
GloVe by using the word2vec⁴ and GloVe⁵ implementations. We learn the vector
representations separately from a collection of tweets and a collection of
medical articles, using a window size of 10 words. The tweet collection
(denoted Twitter) contains 419,702,147 English tweets, which are related to
11 drug names and 6 cities, while the medical article collection (denoted
BMC) includes all medical articles from BioMed Central⁶. For both CBOW and
GloVe, we create vector representations with vector sizes of 50 and 200.


4.4 Learning Phrase-based Model 

We use the phrase-based MT technique of Koehn et al. (2003), as implemented
in the Moses toolkit (Koehn et al., 2007) with default settings, to learn to
translate from the Twitter language to the medical language. In particular,
when training the translator, we show the learner pairs of the Twitter
phrases and descriptions of the corresponding SNOMED-CT concepts.


⁴ https://code.google.com/p/word2vec/

⁵ http://nlp.stanford.edu/projects/glove/

⁶ http://www.biomedcentral.com/about/datamining

















Table 1: MRR-5 performance of the proposed approach and the baselines.
Significant differences (p < 0.05) with the cosine similarity (vSim)
baselines with the one-hot representation, and with the corresponding
distributed word representation (e.g. CBOW or GloVe), are denoted ^ and *,
respectively.

Approach       One-hot   BMC                                     Twitter
                         CBOW                GloVe               CBOW                GloVe
                         50        200       50        200       50        200       50        200
vSim           0.1675    0.1771    0.1896    0.1840    0.1869    0.1812    0.1813    0.0936    0.1807
bestMT         0.2232    0.1926    0.2070    0.1803    0.2500^   0.2014    0.2047    0.1258    0.2138
top5MT         0.2491^   0.1994    0.2104    0.1879    0.2638^*  0.2037    0.2095    0.1322    0.2362
top5MTr        0.2458^   0.1982    0.2109    0.1894    0.2617^   0.2037    0.2096    0.1322    0.2310
bestMT+vSim    0.2420^   0.1910    0.1953    0.1860    0.2532^   0.1891    0.1954    0.1078    0.2374
top5MT+vSim    0.2556^   0.1916    0.2144    0.1726    0.2600^   0.1978    0.2068    0.1079    0.2405^
top5MTr+vSim   0.2594^   0.1861    0.2070    0.1802    0.2590^   0.1959    0.2027    0.1129    0.2406^


5 Experimental Results 

We evaluate 6 different instantiations of the proposed approach discussed in
Section 3, including:

1. bestMT: set $k = 1$ when finding the translated phrase $phr_m$ for a
Twitter phrase $phr_t$ (Equation (1)), before ranking target medical concepts
for the translated phrase $phr_m$ using Equation (2).

2. top5MT: similar to bestMT, but set $k = 5$.

3. top5MTr: similar to top5MT, but also consider the rank position of the
translated phrases when ranking the target medical concepts by using
Equation (3).

4. bestMT+vSim: combine the ranking generated from bestMT with the cosine
similarity between the vector representations of the Twitter phrase $phr_t$
and the description $desc_c$ of target medical concepts by using
Equation (4).

5. top5MT+vSim: similar to bestMT+vSim, but use the ranking from top5MT.

6. top5MTr+vSim: similar to bestMT+vSim, but use the ranking from top5MTr.

Another baseline is vSim, where we consider only the cosine similarity
between the vector representations of the Twitter phrase $phr_t$ and the
description $desc_c$ of target medical concepts.

Table 1 compares the performance of these 6 instantiations and the vSim
baseline in terms of MRR-5. We first observe that for the vSim baseline,
except for the word vector representation with vector size 50 learned using
GloVe from the Twitter collection, word vector representations learned using
either CBOW or GloVe are more effective than the one-hot representation.
However, the difference in MRR-5 performance is not statistically significant
(p > 0.05, paired t-test). In addition, word vector representations learned
using either CBOW or GloVe with vector size 200 are more effective than those
with vector size 50.

Next, we find that our adaptation of phrase-based MT (i.e. bestMT, top5MT and
top5MTr) significantly (p < 0.05) outperforms the vSim baseline. For example,
with the one-hot representation, top5MT (MRR-5 0.2491) and top5MTr (MRR-5
0.2458) perform significantly (p < 0.05) better than vSim (MRR-5 0.1675) by
up to 49%. Meanwhile, when using word vector representations with the vector
size 200 learned using GloVe from the BMC collection, top5MT (MRR-5 0.2638)
significantly (p < 0.05) outperforms vSim with both the GloVe vector
representation (MRR-5 0.1869) and the one-hot representation (MRR-5 0.1675).
We observe similar trends in performance when using vector representations
learned from the Twitter collection. These results show that our adapted
phrase-based MT techniques are effective for the medical term normalisation
task.
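
The percentage quoted above appears to be a relative gain over the baseline, i.e. (system - baseline) / baseline. The short sketch below reproduces this arithmetic for the one-hot MRR-5 values in Table 1; the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline, system):
    """Return the relative gain (system - baseline) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (system - baseline) / baseline


# MRR-5 values copied from Table 1 (one-hot representation).
VSIM_ONE_HOT = 0.1675
TOP5MT_ONE_HOT = 0.2491
TOP5MTR_ONE_HOT = 0.2458

if __name__ == "__main__":
    for name, score in [("top5MT", TOP5MT_ONE_HOT), ("top5MTr", TOP5MTR_ONE_HOT)]:
        gain = relative_improvement(VSIM_ONE_HOT, score)
        print(f"{name}: {gain:.1%} relative gain over vSim")
```

For top5MT this gives (0.2491 - 0.1675) / 0.1675, roughly 0.487, which rounds to the 49% stated in the text.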

In addition, we observe the effectiveness of our combined approach (i.e.
bestMT+vSim, top5MT+vSim and top5MTr+vSim), as it further improves the
performance of the adapted phrase-based MT (i.e. bestMT, top5MT and top5MTr,
respectively), when using the one-hot representation. For example,
top5MTr+vSim achieves the MRR-5 of 0.2594, while the MRR-5 of top5MTr is
0.2458. However, the performance difference is not statistically significant.
Meanwhile, when using the CBOW and GloVe vectors, the achieved performance
varies with the collection (i.e. BMC or Twitter) used for learning the
vectors and the size of the vectors.

6 Conclusions 

We have introduced our approach that adapts a phrase-based MT technique to
normalise medical terms in Twitter messages. We evaluate our proposed
approach using a collection of phrases from tweets related to ADRs. Our
experimental results show that the proposed approach significantly
outperforms an effective baseline by up to 55%. For future work, we aim to
investigate the modelling of learned vector representations, such as CBOW and
GloVe, within a phrase-based MT model when normalising medical terms.

Appendix

Tables 2 and 3 report the MRR-5 performance when using the word vector
representations learned from the BMC and Twitter collections with vector
sizes 50, 100 and 200, using CBOW and GloVe.


Table 2: MRR-5 performance of the proposed approach when the word vector
representation created by CBOW and GloVe is learned from the BMC collection
with vector sizes 50, 100 and 200. Significant differences (p < 0.05) with
the cosine similarity with the one-hot representation, and the cosine
similarity with the corresponding distributed word representation vector, are
denoted ^ and *, respectively. Cells marked [...] are illegible in the source.

Approach       One-hot   CBOW                          GloVe
                         50        100       200       50        100       200
vSim           0.1675    0.1771    0.1882    0.1896    0.1840    0.1593    0.1869
bestMT         0.2232    0.1926    0.1956    [...]     0.1803    [...]     0.2500^
top5MT         0.2491^   0.1994    0.1971    0.2104    0.1879    0.2425^*  0.2638^*
top5MTr        0.2458^   0.1982    0.1971    0.2109    0.1894    [...]     0.2617^
bestMT+vSim    0.2420^   0.1910    [...]     0.1953    0.1860    0.2375^*  0.2532^
top5MT+vSim    0.2556^   0.1916    [...]     0.2144    0.1726    0.2381^*  0.2600^
top5MTr+vSim   0.2594^   0.1861    0.1918    [...]     0.1802    0.2451^*  0.2590^

Table 3: MRR-5 performance of the proposed approach when the word vector
representation created by CBOW and GloVe is learned from the Twitter
collection with vector sizes 50, 100 and 200. Significant differences
(p < 0.05) with the cosine similarity with the one-hot representation, and
the cosine similarity with the corresponding distributed word representation
vector, are denoted ^ and *, respectively. Cells marked [...] are illegible
in the source.

Approach       One-hot   CBOW                          GloVe
                         50        100       200       50        100       200
vSim           0.1675    0.1812    0.1901    0.1813    0.0936    0.1836    0.1807
bestMT         0.2232    [...]     0.1993    0.2047    0.1258    0.2114    0.2138
top5MT         0.2491^   [...]     0.2060    0.2095    0.1322    [...]     0.2362
top5MTr        0.2458^   [...]     0.2037    0.2096    0.1322    0.2279    0.2310
bestMT+vSim    0.2420^   0.1891    0.1959    0.1954    [...]     0.2161    0.2374
top5MT+vSim    0.2556^   0.1978    0.2033    0.2068    [...]     0.2420^   0.2405^
top5MTr+vSim   0.2594^   0.1959    0.1913    0.2027    [...]     0.2352    0.2406^

























































